fix: replace regex HTML pre-scan with linear scanner#49
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #49 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 7 7
Lines 80 98 +18
Branches 11 16 +5
=========================================
+ Hits 80 98 +18 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b06a7ac421
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -1,10 +1,43 @@ | |||
| import { extractMatchRangeList } from './utils'; | |||
| import { MatchRange } from './utils'; | |||
There was a problem hiding this comment.
Fix unresolved MatchRange import
useCheckIsHtmlTag.ts now imports MatchRange from ./utils, but utils.ts does not export that symbol. This introduces a real TypeScript error (TS2305) during declaration generation (reproducible via pnpm --filter text-vide build), so any strict type-checking pipeline or downstream source type-check will fail until this type is exported or defined/imported correctly as a type.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 9a91465. MatchRange is now exported from ./utils as export type MatchRange = [number, number], and extractMatchRangeList's return type is annotated to match. Verified locally: pnpm --filter text-vide build now completes declaration emit (vite-plugin-dts) without TS2305.
| if (closeIndex === -1) { | ||
| break; |
There was a problem hiding this comment.
Continue scanning after unterminated HTML comment opener
Breaking out of the scanner when <!-- has no matching --> causes all later tags to be ignored by the pre-scan, so words inside subsequent real tags can be incorrectly highlighted. In this commit, inputs like "x <!-- unterminated <div>ab</div> yy <i>cd</i>" now produce no tag ranges at all, whereas the previous implementation still identified later tag-like ranges; this is a behavior regression for malformed-but-encountered text inputs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 9a91465. When <!-- has no matching -->, the scanner no longer breaks; it falls through to the normal <...> handler so the < is treated as a regular tag opener. This matches the original /<!--[^]*?-->|<[^>]+>/g behavior, where an unterminated comment fell back to the <[^>]+> alternative.
Concrete check on the example from this review (x <!-- unterminated <div>ab</div> yy <i>cd</i>):
- Before this commit:
getHtmlTagRangeList()returned[], so all words got highlighted, including those inside<div>/<i>. - After this commit: ranges cover
<!-- unterminated <div>,</div>,<i>,</i>— identical to the previous regex output.
Three regression tests were added covering this case (multi-tag after unterminated <!--), lone-<!--, and empty <>. Patch coverage is back to 100%.
- Export MatchRange from utils so useCheckIsHtmlTag's import resolves (fixes TS2305 during vite-plugin-dts declaration emit). - Fall through to normal <...> scan when <!-- has no matching --> instead of aborting the loop, restoring the original regex behavior. - Add regression tests for unterminated comments, lone <!--, and empty <>.
Motivation
<characters or unterminated<!--comments, creating a CPU DoS risk whenignoreHtmlTagis enabled by default.Description
matchAll/regex pre-scan inuseCheckIsHtmlTagwith a lineargetHtmlTagRangeListscanner that usesindexOfto find<,>, and-->boundaries.<!-- ... -->comments and normal<...>tags.useCheckIsHtmlTagand by producing the same range-list shape for downstream checks. In particular, an unterminated<!--falls through to normal<...>scanning so subsequent tags are still recognized — matching the original/<!--[^]*?-->|<[^>]+>/gbehavior.MatchRangetype alias fromutilsand annotateextractMatchRangeList's return type so the declaration build resolves the symbol.Follow-up fixes (9a91465)
Addresses the two findings from the Codex review on b06a7ac:
MatchRangeimport:utils.tsdid not export the type, which causedTS2305duringvite-plugin-dtsdeclaration emit. Now exported asexport type MatchRange = [number, number], andextractMatchRangeList's signature is annotated to match.<!--regression: the initial scannerbreak'd on a missing-->, dropping every later tag. The scanner now falls through to the normal<...>handler so the<is treated as a regular tag opener, restoring the previous regex behavior.Testing
pnpm --filter text-vide test— 51 tests passed (47 → 51).pnpm --filter text-vide build— declaration emit succeeds with no TS errors.<!--<!--<!--opener with no further tags<>is ignoredCodex Task